Manifold Learning for Human Population Structure Studies

Authors

  • Hoicheong Siu
  • Li Jin
  • Momiao Xiong
Abstract

The dimension of the population genetics data produced by next-generation sequencing platforms is extremely high. However, the "intrinsic dimensionality" of sequence data, which determines the structure of populations, is much lower. This motivates us to use locally linear embedding (LLE), which projects high-dimensional genomic data into a low-dimensional, neighborhood-preserving embedding, as a general framework for population structure and historical inference. To facilitate the application of LLE to population genetic analysis, we systematically investigate several important properties of LLE and reveal the connection between LLE and principal component analysis (PCA). Identifying a set of markers and genomic regions that can be used for population structure analysis will provide invaluable information for population genetics and association studies. In addition to identifying LLE-correlated or PCA-correlated structure-informative markers, we have developed a new statistic that integrates the genomic information content of a genomic region in order to study its association with population structure collectively, together with a LASSO algorithm to search for such regions across the genome. We applied the developed methodologies to a low-coverage pilot dataset from the 1000 Genomes Project and a Phase III Mexico dataset from HapMap. We observed that 25.1%, 44.9% and 21.4% of the common variants and 89.2%, 92.4% and 75.1% of the rare variants were LLE-correlated markers in CEU, YRI and ASI, respectively. This shows that rare variants, which are often private to specific populations, have much higher power to identify population substructure than common variants. These preliminary results demonstrate that next-generation sequencing offers a rich resource and that LLE provides a powerful tool for population structure analysis.
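As a rough illustration of the embedding step described in the abstract, the sketch below implements the textbook LLE algorithm (nearest neighbors, reconstruction weights, bottom eigenvectors) with NumPy and applies it to a simulated genotype matrix. It is not the authors' implementation: the function name `lle`, the regularization constant, the neighbor count, and the two-population simulation are all assumptions made for this example.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Textbook LLE on the rows of X (samples x features). Illustrative only."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Reconstruction weights: each sample as an affine combination of its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]          # neighbors centered on sample i
        C = Z @ Z.T                    # local Gram matrix (k x k)
        trace = np.trace(C)
        # Regularize so the linear system is well conditioned (assumed constant).
        C = C + np.eye(n_neighbors) * reg * (trace if trace > 0 else 1.0)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()    # weights for each sample sum to one

    # Embedding: eigenvectors of (I - W)^T (I - W) with the smallest eigenvalues.
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, eigvecs = np.linalg.eigh(M)     # eigenvalues returned in ascending order
    # Skip the first eigenvector (constant, eigenvalue near zero).
    return eigvecs[:, 1:n_components + 1], W

# Simulated genotypes (0/1/2 allele counts) for two hypothetical populations
# whose allele frequencies differ; this is toy data, not 1000 Genomes data.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 40, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
G = np.vstack([
    rng.binomial(2, freq_a, (n_per_pop, n_snps)),
    rng.binomial(2, freq_b, (n_per_pop, n_snps)),
]).astype(float)

Y, W = lle(G, n_neighbors=10, n_components=2)
```

Here `Y` holds one low-dimensional coordinate pair per individual, which could then be plotted or compared against population labels. The neighbor count controls how local the preserved structure is, and a suitable value depends on the sample size and data.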

Similar resources

The Intellectual Structure of Knowledge in the Field of Distance Education Using the Co-Word analyses

Background: Co- word analysis is one of the content analysis methods used in scientometric studies and mapping the scientific structure of various fields. The purpose of the present research is to map the structure of distance education using the co-word analysis. Methods: The research method is content analysis using co- word analysis. The research population are 31607 documents indexed in the...

Improving the Nonlinear Manifold Discriminant Model for Face Recognition with a Single Image per Person

Manifold learning is a dimension reduction method for extracting nonlinear structures of high-dimensional data. Many methods have been introduced for this purpose. Most of these methods usually extract a global manifold for data. However, in many real-world problems, there is not only one global manifold, but also additional information about the objects is shared by a large number of manifolds...

Graphic analysis of population structure on genome-wide rheumatoid arthritis data

Principal-component analysis (PCA) has been used for decades to summarize the human genetic variation across geographic regions and to infer population migration history. Reduction of spurious associations due to population structure is crucial for the success of disease association studies. Recently, PCA has also become a popular method for detecting population structure and correction of popu...

Learning and Combining Image Similarities for Neonatal Brain Population Studies

The characterization of neurodevelopment is challenging due to the complex structural changes of the brain in early childhood. To analyze the changes in a population across time and to relate them with clinical information, manifold learning techniques can be applied. The neighborhood definition used for constructing manifold representations of the population is crucial for preserving the simil...

Thermomechanical Analysis of Exhaust Manifold and Catalyst for with Fluid Structure Interaction

Design of exhaust manifold is important due to its effects on performance of catalyst, thermal and thermo- mechanical loads. In this paper, first transient analysis of fluid was carried out with FLUENT software, then time-average temperature and heat transfer coefficient contour for three cycle were mapped on inner surface of exhaust manifold with MATLAB software for thermomechanical analysis. ...

Journal title:

Volume 7, Issue

Pages -

Publication date: 2012